I'm creating a custom env to use with DQN. The env consists of a grid (30x30), and the agent's task is to place an object at any of the grid points. So the action space will be large (900 actions).
When placing the objects, the agent can be in different fixed positions (outside the grid), for example 50 positions. It will start at position 0 and end at position 49 at the end of the episode. I need to keep track of the current position (state) and the positions of the placed objects. For that I'm using tensors of 0s and 1s.
When the agent places an object, it gets connected with a line to the position of the robot. After this, the agent places another object from another position. If the new line intersects any of the previous lines, it gets a negative reward.
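For the intersection penalty, this is a minimal sketch of the kind of check I mean, using the standard orientation test for two line segments (function names are placeholders, not part of my env):

```python
def ccw(a, b, c):
    # Cross product of (b - a) and (c - a): positive if a->b->c turns counter-clockwise
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])

def segments_intersect(p1, p2, q1, q2):
    # Proper-intersection test; ignores collinear overlap for brevity
    d1 = ccw(q1, q2, p1)
    d2 = ccw(q1, q2, p2)
    d3 = ccw(p1, p2, q1)
    d4 = ccw(p1, p2, q2)
    return ((d1 > 0) != (d2 > 0)) and ((d3 > 0) != (d4 > 0))

# Crossing diagonals of a unit square -> True
print(segments_intersect((0, 0), (2, 2), (0, 2), (2, 0)))
# Two parallel horizontal segments -> False
print(segments_intersect((0, 0), (1, 0), (0, 1), (1, 1)))
```

On each step you'd compare the new robot-to-object line against every previously drawn line and apply the negative reward if any check returns True.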
For example, let's say I have a 2x2 grid and the agent positions are fixed to 2 positions.
The logic is:
```
first_step_positions = [1, 0]  # The agent is in the first position
placed_objects = [0, 0, 0, 0]  # Grid positions [(0,0), (1,0), (0,1), (1,1)], all zeros because no objects are placed yet
first_obs = [1, 0, 0, 0, 0, 0]  # positions + placed_objects to pass into the NN

# The agent places an object at the (1,1) coordinate
second_step_positions = [0, 1]
placed_objects = [0, 0, 0, 1]  # The last value is 1 because it has an object placed
second_obs = [0, 1, 0, 0, 0, 1]  # positions + placed_objects to pass into the NN
```
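In runnable form, the observation construction for this toy example could look like the sketch below (`make_obs` is a hypothetical helper; the sizes are for the 2x2 toy case, and would be 50 and 900 in the real env):

```python
import numpy as np

N_POSITIONS = 2   # fixed agent positions (2 in this toy example, 50 in the real env)
GRID_CELLS = 4    # flattened 2x2 grid (900 in the real 30x30 env)

def make_obs(position_idx, placed_objects):
    # One-hot agent position concatenated with the placed-objects mask
    pos = np.zeros(N_POSITIONS, dtype=np.float32)
    pos[position_idx] = 1.0
    return np.concatenate([pos, np.asarray(placed_objects, dtype=np.float32)])

placed = [0, 0, 0, 0]
first_obs = make_obs(0, placed)   # -> [1, 0, 0, 0, 0, 0]
placed[3] = 1                     # object placed at grid coordinate (1, 1)
second_obs = make_obs(1, placed)  # -> [0, 1, 0, 0, 0, 1]
```

So the network input size is N_POSITIONS + GRID_CELLS, which is 950 features in the real env.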
The idea is to pass these observations into the neural network, but I'm concerned about the number of features they have, because I will need to make the hidden layers much bigger.
I'm still an RL beginner. I'm sure there must be another, more efficient way to do this.
Thanks!
submitted by /u/Pipiyedu